16 research outputs found

    Highly Efficient Twin Module Structure of 64-Bit Exponential Function Implemented on SGI RASC Platform

    This paper presents an implementation of the double precision exponential function. A novel table-based architecture, together with a short Taylor expansion, provides a low latency (30 clock cycles), which is comparable to 32-bit implementations. The low area consumption of a single exp() module (roughly 4% of XC4LX200) allows several modules to be implemented in a single FPGA. The employment of massive parallelism results in high performance of the module. Nevertheless, because of the external memory interface limitation, only a twin module structure is presented in this paper. This implementation aims primarily to meet the strict and demanding precision and speed requirements of quantum chemistry. Each module is capable of operating at 200 MHz with a maximum error of 1 ulp and an RMSE of 0.6
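
    The combination of a lookup table and a short Taylor expansion mentioned in the abstract can be illustrated in software. The sketch below is only an assumption about the general technique, not the paper's hardware architecture: the table size (`TABLE_BITS`), the polynomial degree, and the function name `exp_table_taylor` are all hypothetical choices, and no claim is made that it reaches the 1 ulp accuracy reported above.

```python
import math

TABLE_BITS = 7                      # assumed table size: 2**7 = 128 entries
TABLE_SIZE = 1 << TABLE_BITS
# Precomputed values of 2**(j / TABLE_SIZE) for j = 0 .. TABLE_SIZE - 1.
EXP2_TABLE = [2.0 ** (j / TABLE_SIZE) for j in range(TABLE_SIZE)]
LN2_OVER_N = math.log(2.0) / TABLE_SIZE


def exp_table_taylor(x: float) -> float:
    """Approximate exp(x) with a lookup table plus a short Taylor series."""
    # Range reduction: x = n * (ln2 / N) + r, with |r| <= ln2 / (2 * N).
    n = round(x / LN2_OVER_N)
    r = x - n * LN2_OVER_N
    # Split n into a power-of-two exponent and a table index in [0, N).
    exponent, index = divmod(n, TABLE_SIZE)
    # Degree-5 Taylor polynomial of exp(r) in Horner form; r is tiny,
    # so the truncated terms are far below double precision resolution.
    poly = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0
                 + r * (1.0 / 24.0 + r / 120.0))))
    # exp(x) = 2**exponent * 2**(index / N) * exp(r)
    return math.ldexp(EXP2_TABLE[index] * poly, exponent)
```

    A larger table shortens the polynomial and a smaller table lengthens it; a hardware design would weigh that trade-off against memory and multiplier resources.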

    The Algorithms for FPGA Implementation of Sparse Matrices Multiplication

    Compared with dense matrix multiplication, the real performance of sparse matrix multiplication on a CPU is roughly 5 to 100 times lower when expressed in GFLOPS. For sparse matrices, microprocessors spend most of their time comparing matrix indices rather than performing floating-point multiply and add operations. For 16-bit integer operations, such as index comparisons, the computational power of an FPGA significantly surpasses that of a CPU. Consequently, this paper presents a novel theoretical study of how the matrix sparsity factor influences the ratio of index-comparison workload to floating-point workload. As a result, a novel FPGA architecture for sparse matrix-matrix multiplication is presented in which index comparison and floating-point operations are separated. We also verified our idea in practice, and the initial implementation results are very promising. To further decrease the hardware resources required by the floating-point multiplier, a reduced-width multiplication is proposed for cases in which IEEE-754 standard compliance is not required
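
    The workload ratio described in the abstract can be made concrete with a small software model. The sketch below is an illustrative assumption, not the paper's architecture: it computes one output element as the dot product of a sparse row and a sparse column stored as index-sorted `(index, value)` pairs, and counts merge-step index comparisons separately from floating-point operations. The type alias `SparseVec` and the function `sparse_dot` are hypothetical names.

```python
from typing import List, Tuple

SparseVec = List[Tuple[int, float]]  # (index, value) pairs sorted by index


def sparse_dot(a: SparseVec, b: SparseVec) -> Tuple[float, int, int]:
    """Return (dot product, index comparisons, floating-point operations)."""
    i = j = 0
    total = 0.0
    comparisons = 0
    flops = 0
    while i < len(a) and j < len(b):
        comparisons += 1            # one index comparison per merge step
        index_a, index_b = a[i][0], b[j][0]
        if index_a == index_b:
            total += a[i][1] * b[j][1]
            flops += 2              # one multiply and one add
            i += 1
            j += 1
        elif index_a < index_b:
            i += 1
        else:
            j += 1
    return total, comparisons, flops
```

    For `a = [(0, 1.0), (3, 2.0), (7, 4.0)]` and `b = [(1, 5.0), (3, 0.5), (8, 1.0)]` the function performs four index comparisons for only two floating-point operations, which is the kind of imbalance the abstract refers to.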

    Flow caching effectiveness in packet forwarding applications

    Routing algorithms are known to be a potential bottleneck for packet processing. Network flow caching can function as a general acceleration technique for packet processing workloads. The goal of this article is to evaluate the effectiveness of packet flow caching techniques in high-speed networks. The focus is on the data distribution characteristics that make caching of network flows (connections) effective. Based on statistical analysis and simulations, the article sets out necessary conditions for the effective use of caches in packet forwarding applications. Public domain network traces were examined and measured for data locality. Software simulations show a strong correlation between the flow packet distance metric and the cache hit rate
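
    A cache simulation of the kind mentioned in the abstract can be sketched in a few lines. The example below assumes a least-recently-used (LRU) replacement policy and an abstract flow identifier per packet; neither the policy nor the function name `lru_hit_rate` comes from the article, which does not state its cache design here.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def lru_hit_rate(flow_ids: Iterable[Hashable], capacity: int) -> float:
    """Replay a packet trace through an LRU flow cache; return the hit rate."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = 0
    total = 0
    for flow in flow_ids:
        total += 1
        if flow in cache:
            hits += 1
            cache.move_to_end(flow)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used flow
            cache[flow] = None
    return hits / total if total else 0.0
```

    With a capacity of 2, the trace `a a b a c b` yields two hits in six packets, while the trace `a b c a b c` yields none: packets of the same flow are always separated by two other flows, so the cached entry is evicted before it can be reused.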

    Analysis of the Basic Implementation Aspects of Hardware-Accelerated Density Functional Theory Calculations

    This paper presents a Field Programmable Gate Array (FPGA) implementation of a calculation module for the exponential part of a Gaussian Type Orbital (GTO). The module is composed of several specially crafted floating-point modules which are fully pipelined and optimized for high performance. The hardware implementation revealed a significant speed-up for the calculation of the finite sum of exponential products, ranging from 2.5x to 20x in comparison to a general-purpose Central Processing Unit (CPU) version. Calculating the values of GTOs is one of the computationally critical parts of the Kohn-Sham algorithm. The approach proposed in the paper aims to increase the performance of a part of the quantum chemistry computational system by employing an FPGA-based accelerator. Several issues are addressed, such as identification of the code fragments which benefit most from hardware acceleration, porting a part of the Kohn-Sham algorithm to the FPGA, data precision adjustment, and data transfer overhead. The authors' intention was also to make the hardware implementation of the orbital function calculation universal and easily attachable to different quantum-chemistry software packages

    USING STANDARD HARDWARE ACCELERATORS TO DECREASE COMPUTATION TIMES IN SCIENTIFIC APPLICATIONS

    Nowadays, general-purpose processors are widely used in scientific computing. However, when high computational throughput is needed, it is worth considering whether dedicated hardware solutions would be more efficient, either in terms of performance (or performance-to-price ratio), or in terms of power efficiency, or both. This paper describes such solutions briefly and compares them with the architecture of contemporary general-purpose processors

    Evaluation and Implementation of n-Gram-Based Algorithm for Fast Text Comparison

    This paper presents a study of an n-gram-based document comparison method. The method is intended for building a large-scale plagiarism detection system. The work focuses not only on the efficiency of text similarity extraction but also on the execution performance of the implemented algorithms. We examined the detection performance, storage requirements, and execution time of the proposed approach. The obtained results show the trade-offs between detection quality and computational requirements. GPGPU and multi-CPU platforms were considered for implementing the algorithms and achieving good execution speed. The method consists of two main algorithms: document feature extraction and fast text comparison. The winnowing algorithm is used to generate a compressed representation of the analyzed documents. The authors designed and implemented a dedicated test framework for the algorithm, which allowed for the tuning, evaluation, and optimization of the parameters. Well-known metrics (e.g. precision, recall) were used to evaluate detection performance. The authors conducted tests to determine the performance of the winnowing algorithm for obfuscated and unobfuscated texts with different window and n-gram sizes. Also, a simplified version of the text comparison algorithm was proposed and evaluated to reduce the computational complexity of the text comparison process. The paper also presents GPGPU and multi-CPU implementations of the algorithms for different data structures. The implementation speed was tested for different algorithm parameters and data sizes. The scalability of the algorithm on multi-CPU platforms was verified. The authors of the paper provide a repository of the software tools and programs used to perform the experiments
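
    The winnowing step named in the abstract selects a subset of n-gram hashes as a document fingerprint. The sketch below follows the commonly described form of winnowing (the minimum hash in each sliding window, taking the rightmost one on ties); the use of `zlib.crc32` as the hash, character-level n-grams, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import zlib
from typing import List, Set, Tuple


def kgram_hashes(text: str, k: int) -> List[int]:
    """Hash every character k-gram of the text (CRC32 is an assumed choice)."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(text: str, k: int, window: int) -> Set[Tuple[int, int]]:
    """Return the fingerprint as a set of (hash, k-gram position) pairs."""
    hashes = kgram_hashes(text, k)
    fingerprints: Set[Tuple[int, int]] = set()
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        # On ties, pick the rightmost occurrence of the minimum.
        offset = window - 1 - chunk[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints
```

    Two documents can then be compared by the overlap of their selected hash values; the window and n-gram sizes control how compressed the representation is, which is the parameter space the abstract says was explored.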

    Experiment on Methods for Clustering and Categorization of Polish Text

    The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised learning was employed for categorization and unsupervised learning for clustering. The employed methods were examined thoroughly on a custom-built corpus of Polish texts. The corpus was assembled by the authors from Internet resources. The corpus data was acquired from a news portal and was therefore already sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments was conducted that revealed certain properties of the algorithms and their accuracy. The accuracy of the algorithms was evaluated in terms of their ability to match the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of allocation
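
    The Vector Space Model with TF-IDF weighting mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: documents are already tokenized, term frequency is the relative count, and inverse document frequency is the unsmoothed `log(N / df)`. The abstract does not say which TF-IDF variant the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

    With three made-up token lists, `["sejm", "ustawa", "minister"]`, `["mecz", "gol", "bramkarz"]`, and `["ustawa", "minister", "podatek"]`, the first and third documents have a positive cosine similarity while the first and second have none, which is the signal a categorizer or clustering algorithm would build on.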

    COMPUTATION ACCELERATION ON SGI RASC: FPGA BASED RECONFIGURABLE COMPUTING HARDWARE

    In this paper a novel method of computation using FPGA technology is presented. In several cases this method provides a calculation speedup with respect to General Purpose Processors (GPP). The main concept of this approach is to design the computing hardware architecture so that it fits the algorithm dataflow and makes the best use of well-known computing techniques such as pipelining and parallelism. Configurable hardware is used as an implementation platform for custom-designed hardware. The paper presents implementation results for algorithms used in areas such as cryptography, data analysis, and scientific computation. Other promising application areas for the new technology, such as bioinformatics, are also mentioned. The algorithms were designed, tested, and implemented on the SGI RASC platform. The RASC module is a part of Cyfronet's SGI Altix 4700 SMP system. We also present the modern RASC architecture. In principle it consists of FPGA chips and very fast, 128-bit wide local memory. The design tools available to designers are also presented

    Using simulation to calibrate real data acquisition in veterinary medicine

    This paper explores the innovative use of simulation environments to enhance data acquisition and diagnostics in veterinary medicine, focusing specifically on gait analysis in dogs. The study harnesses the power of Blender and the Blenderproc library to generate synthetic datasets that reflect diverse anatomical, environmental, and behavioral conditions. The generated data, represented in graph form and standardized for optimal analysis, is utilized to train machine learning algorithms for identifying normal and abnormal gaits. Two distinct datasets with varying degrees of camera angle granularity are created to further investigate the influence of camera perspective on model accuracy. Preliminary results suggest that this simulation-based approach holds promise for advancing veterinary diagnostics by enabling more precise data acquisition and more effective machine learning models. By integrating synthetic and real-world patient data, the study lays a robust foundation for improving overall effectiveness and efficiency in veterinary medicine

    Crowdsourcing hypothesis tests: Making transparent how design choices shape research results

    To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.